
IEEE Journal of Biomedical and Health Informatics

Institute of Electrical and Electronics Engineers (IEEE)

All preprints, ranked by how well they match IEEE Journal of Biomedical and Health Informatics's content profile, based on 34 papers previously published here. The average preprint has a 0.08% match score for this journal, so anything above that is already an above-average fit. Older preprints may already have been published elsewhere.

1
PAMG-AT: A Physiological Attention Multi-Graph Model with Adaptive Topology for Stress Detection using Wearable Devices

YILDIZ, O.; Subasi, A.

2026-03-05 bioinformatics 10.64898/2026.03.02.709179 medRxiv
Top 0.1%
68.6%

Stress detection with wearable physiological sensors is vital in digital health and affective computing. Conventional machine learning techniques usually examine physiological signals separately, missing the intricate inter-signal connections involved in the human stress response. While deep neural networks offer high accuracy, they function as black boxes, offering minimal understanding of the physiological processes behind stress detection. This study introduces a hierarchical graph neural network framework for WESAD stress detection, establishing a methodology for affective computing that emphasizes interpretability and extensibility while maintaining strong predictive performance. We propose PAMG-AT (Physiological Attention Multi-Graph with Adaptive Topology), a hierarchical graph neural network architecture for stress detection using multimodal physiological signals. In this framework, physiological features serve as nodes within a knowledge-driven graph, while edges represent established physiological relationships, including cardiac-electrodermal coupling and cardio-respiratory interaction. The architecture employs a three-level attention mechanism: spatial encoding via Graph Attention Networks (GAT) to assess feature importance, temporal modeling with a Transformer to capture dynamics across time windows, and global pooling for classification. The model is evaluated using three sensor configurations (chest-only, wrist-only, and hybrid) on the WESAD dataset, employing rigorous Leave-One-Subject-Out (LOSO) cross-validation. PAMG-AT achieves competitive performance, with 94.59% accuracy (±6.8%) for chest sensors, 91.76% (±9.2%) for wrist sensors, and 92.80% (±8.33%) for the hybrid configuration. The proposed method provides interpretability via attention weights, revealing that ECG-EDA relationships (cardiac-electrodermal coupling) are most predictive of stress. 
Three low-responder subjects (S2, S3, S9) with atypical physiological stress patterns demonstrate lower accuracy (81-87%), offering clinically valuable insights for personalized stress management. The effective wrist-only configuration, achieving 91.76% accuracy, supports practical deployment in consumer wearables.
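
The Leave-One-Subject-Out protocol mentioned in this abstract can be illustrated with a minimal sketch. It assumes nothing about the authors' code: the function names (`loso_splits`, `loso_accuracy`), the toy subject IDs and labels, and the majority-class baseline are all made up for illustration. Each subject is held out once, a model is fit on the remaining subjects, and the per-subject accuracies are summarized as a mean and a spread, which is presumably how figures of the form "accuracy (± deviation)" arise.

```python
from statistics import mean, pstdev


def loso_splits(subject_ids):
    """Yield (held_out, train_idx, test_idx) once per distinct subject."""
    for held_out in sorted(set(subject_ids)):
        train_idx = [i for i, s in enumerate(subject_ids) if s != held_out]
        test_idx = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield held_out, train_idx, test_idx


def loso_accuracy(subject_ids, labels, fit, predict):
    """Return per-subject accuracies plus their mean and population std."""
    per_subject = {}
    for held_out, train_idx, test_idx in loso_splits(subject_ids):
        model = fit(train_idx)  # the model never sees the held-out subject
        hits = sum(predict(model, i) == labels[i] for i in test_idx)
        per_subject[held_out] = hits / len(test_idx)
    scores = list(per_subject.values())
    return per_subject, mean(scores), pstdev(scores)


# Toy data (hypothetical): two windows per subject, binary stress labels.
subjects = ["S2", "S2", "S3", "S3", "S4", "S4"]
labels = [1, 0, 1, 1, 0, 1]


def fit_majority(train_idx):
    """Hypothetical baseline: predict the majority class of the training fold."""
    ones = sum(labels[i] for i in train_idx)
    return 1 if 2 * ones >= len(train_idx) else 0


def predict_majority(model, i):
    """The baseline ignores the window and returns the stored class."""
    return model


per_subject, avg, spread = loso_accuracy(
    subjects, labels, fit_majority, predict_majority
)
print(per_subject, round(avg, 3), round(spread, 3))
```

Splitting by subject rather than by window keeps windows from the same person out of both the training and test folds, which is the property that makes LOSO estimates meaningful for unseen wearers.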

2
Multi-Source Graph Synthesis (MUGS) for Pediatric Knowledge Graphs from Electronic Health Records

Li, M.; Li, X.; Pan, K.; Geva, A.; Yang, D.; Sweet, S. M.; Bonzel, C.-L.; Panickan, V. A.; Xiong, X.; Mandl, K. D.; Cai, T.

2024-01-16 pediatrics 10.1101/2024.01.14.24301302 medRxiv
Top 0.1%
49.0%

The wealth of valuable real-world medical data found within Electronic Health Record (EHR) systems is particularly significant in the field of pediatrics, where conventional clinical studies face notably high barriers. However, constructing accurate knowledge graphs from pediatric EHR data is challenging due to its limited content density compared to EHR data for the general population. Additionally, knowledge graphs built from EHR data primarily covering adult patients may not suit the unique biomedical characteristics of pediatric patients. In this research, we introduce a graph transfer learning approach aimed at constructing precise pediatric knowledge graphs. We present MUlti-source Graph Synthesis (MUGS), an algorithm designed to create embeddings for pediatric EHR codes by leveraging information from three distinct sources: (1) pediatric EHR data, (2) EHR data from the general population, and (3) existing hierarchical medical ontology knowledge shared across different patient populations. We break down these code embeddings into shared and unshared components, facilitating the adaptive and robust capture of varying levels of heterogeneity across different medical sites through meticulous hyperparameter tuning. We assessed the quality of these code embeddings in recognizing established relationships among pediatric codes, as curated from credible online sources, pediatric physicians, or GPT. Furthermore, we developed a web API for visualizing pediatric knowledge graphs generated using MUGS embeddings and devised a phenotyping algorithm to identify patients with characteristics similar to a given profile, with a specific focus on pediatric pulmonary hypertension (PH). The MUGS-generated embeddings demonstrated resilience against negative transfer and exhibited superior performance across all three tasks when compared to pediatric-only approaches, multi-site pooling, and semantic-based methods. 
MUGS embeddings open up new avenues for evidence-based pediatric research utilizing EHR data.

3
Accurate prediction of neurologic changes in critically ill infants using pose AI

Gleason, A.; Richter, F.; Beller, N.; Arivazhagan, N.; Feng, R.; Holmes, E.; Glicksberg, B. S.; Morton, S. U.; La Vega-Talbott, M.; Fields, M.; Guttmann, K.; Nadkarni, G. N.; Richter, F.

2024-04-19 pediatrics 10.1101/2024.04.17.24305953 medRxiv
Top 0.1%
47.9%

Infant alertness and neurologic changes can reflect life-threatening pathology but are assessed by exam, which can be intermittent and subjective. Reliable, continuous methods are needed. We hypothesized that our computer vision method to track movement, pose AI, could predict neurologic changes in the neonatal intensive care unit (NICU). We collected 4,705 hours of video linked to electroencephalograms (EEG) from 115 infants. We trained a deep learning pose algorithm that accurately predicted anatomic landmarks in three evaluation sets (ROC-AUCs 0.83-0.94), showing feasibility of applying pose AI in an ICU. We then trained classifiers on landmarks from pose AI and observed high performance for sedation (ROC-AUCs 0.87-0.91) and cerebral dysfunction (ROC-AUCs 0.76-0.91), demonstrating that an EEG diagnosis can be predicted from video data alone. Taken together, deep learning with pose AI may offer a scalable, minimally invasive method for neuro-telemetry in the NICU.

4
A Lightweight, End-to-End Explainable, and Generalized attention-based graph neural network to Classify Autism Spectrum Disorder using Meta-Connectivity

Bhavna, K.; Ghosh, N.; Banerjee, R.; Roy, D.

2024-07-18 health informatics 10.1101/2024.07.17.24310610 medRxiv
Top 0.1%
43.1%

Recent technological advancements in Graph Neural Networks (GNNs) have been extensively used to diagnose brain disorders such as autism (ASD), which is associated with deficits in social communication, interaction, and restricted/repetitive behaviors. However, the existing machine-learning/deep-learning (ML/DL) models suffer from low accuracy and explainability due to their internal architecture and feature extraction techniques, which also predominantly focus on node-centric features. As a result, performance is moderate on unseen data due to neglect of edge-centric features. Here, we argue that meaningful features and information can be extracted by focusing on meta-connectivity between large-scale brain networks, which is an edge-centric higher-order dynamic correlation in time. In the current study, we have proposed a novel explainable and generalized node-edge connectivity-based graph attention neural network (Ex-NEGAT) model to classify ASD subjects from neuro-typicals (TD) on unseen data using a node edge-centric feature set for the first time and predicted their symptom severity scores. We used the ABIDE (I and II) dataset with a large sample size (total no. of samples = 1500). The framework employs meta-connectivity derived from Theory-of-Mind (ToM), Default-mode Network (DMN), Central executive (CEN), and Salience network (SN) that measure the dynamic functional connectivity (dFC) as a flow across morphing connectivity configurations. To generalize the Ex-NEGAT model, we trained the proposed model on ABIDE I (no. of samples = 840) and performed testing on the ABIDE II (no. of samples = 660) dataset and achieved 88% accuracy with an F1-score of 0.89. 
Additionally, we identified symptom severity scores for each individual subject by using the meta-connectivity links between relevant brain networks and passing them to a Connectome-based Prediction Modelling (CPM) pipeline to identify the specific large-scale brain networks whose edge connectivity contributed positively and negatively to the prediction. Our approach accurately predicted ADOS-Total, ADOS-Social, ADOS-Communication, ADOS-Module, ADOS-STEREO, and FIQ scores.

5
Deep AI model for autism detection using naturalistic behavioral videos

Chen, G.; Zhou, W.; Zhang, L.; Ji, Y.; Zhang, Q.; Ren, T.; Tan, H.; Chen, J.; Liu, K.; Song, X.; Huang, S.; Gu, L.; Liu, J.; Wang, H.; Sui, G.; Wang, Y.; Han, X.; Wang, W.; Li, F.

2025-10-09 pediatrics 10.1101/2025.10.01.25336912 medRxiv
Top 0.1%
42.1%

Profound heterogeneity in Autism Spectrum Disorder (ASD) complicates diagnosis and the development of effective treatments. In healthcare systems with limited specialist resources, the need for rapid and accessible screening tools is particularly urgent. In the present study, we developed an objective, scalable pipeline that pairs a simple two-minute video recording of a child's naturalistic behavior with a deep AI model for autism detection, representing the first such rapid, scalable framework. By analyzing a rich spectrum of children's responses to social stimuli, such as subtle behavioral patterns, gaze dynamics, facial morphology, and dynamic facial complexity, the deep AI model provides powerful support for clinical workflows, demonstrating high accuracy in identifying ASD risk across diverse internal and external test cohorts, irrespective of sex, age, or cognitive function. Furthermore, a series of comprehensive analyses confirmed the model's clinical relevance and revealed its capacity to objectively stratify ASD heterogeneity into neurobiologically distinct subtypes. This work establishes a highly efficient and objective framework for large-scale screening, providing a data-driven foundation to stratify heterogeneity and paving the way for the future development of targeted interventions.

6
Adaptive, Unlabeled and Real-time Approximate-Learning Platform (AURA) for Personalized Epileptic Seizure Forecasting

Yang, Y.; Truong, N. D.; Eshraghian, J. K.; Nikpour, A.; Kavehei, O.

2021-11-03 health informatics 10.1101/2021.09.30.21264287 medRxiv
Top 0.1%
40.2%

A high-performance event detection system is all you need for some predictive studies. Here, we present AURA: an Adaptive forecasting model trained with Unlabeled, Real-time data using internally generated Approximate labels on-the-fly. By harnessing the correlated nature of time-series data, a pair of detection and prediction models are coupled together such that the detection model generates labels automatically, which are then used to train the prediction model. AURA relies on several simple principles and assumptions: (i) the performance of an event prediction/forecasting model in the target application remains below the performance of an event detection model, (ii) detected events are treated as weak labels and deemed reliable enough for online training of a predictive model, and (iii) system performance and/or system responsive feedback characteristics can be tuned for a subject-under-test. For example, in medical patient monitoring, this enables personalizing forecasting models. Seizure prediction is identified as an ideal test case of AURA, as pre-ictal brainwaves are patient-specific and tailoring models to individual patients can significantly improve forecasting performance. AURA is used to generate an individual forecasting model for 10 patients, showing an average relative improvement in sensitivity by 14.30% and reduction in false alarms by 19.61%. This paper presents a proof-of-concept for the feasibility of online transfer-learning on a stream of time-series neurophysiological data that paves the way towards a low-power neuromorphic neuromodulation system.
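
The coupling described in this abstract, where a detector's outputs become weak labels for training a forecaster online, can be sketched roughly as follows. This is one plausible reading and not the authors' implementation: the labeling rule, the `horizon` parameter, the function names, and the single-step logistic update are all assumptions chosen for illustration.

```python
import math


def weak_preevent_labels(detections, horizon):
    """Turn per-window detector flags (0/1) into weak forecasting labels.

    A window is labeled 1 if the detector fires within the next `horizon`
    windows, 0 if the full horizon is observed with no firing, and None when
    the window is itself a detected event or its future is not yet known.
    """
    labels = []
    for t, fired in enumerate(detections):
        future = detections[t + 1 : t + 1 + horizon]
        if fired:
            labels.append(None)   # event window: not a forecasting example
        elif any(future):
            labels.append(1)      # an event was detected soon afterwards
        elif len(future) < horizon:
            labels.append(None)   # stream has not advanced far enough yet
        else:
            labels.append(0)
    return labels


def sgd_step(weights, bias, features, label, lr=0.1):
    """One online logistic-regression update on a single weakly labeled window."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    p = 1.0 / (1.0 + math.exp(-z))
    err = p - label               # gradient of the log-loss w.r.t. z
    new_weights = [w - lr * err * x for w, x in zip(weights, features)]
    return new_weights, bias - lr * err


# Toy stream (hypothetical): the detector fires once, at window 3.
detections = [0, 0, 0, 1, 0, 0, 0, 0]
labels = weak_preevent_labels(detections, horizon=2)
print(labels)

# Update a two-feature forecaster with one positive weak label.
weights, bias = sgd_step([0.0, 0.0], 0.0, features=[1.0, 2.0], label=1)
print(weights, bias)
```

Windows labeled None are simply skipped during training, so the forecaster only ever learns from examples whose weak label is settled.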

7
OICNet: A Neural Network for Online EEG Source Separation using Independent Component Analysis

Yeh, P.-T.; Tsai, A. C.; Hsieh, C.-Y.; Yang, C.-C.; Wei, C.-S.

2023-06-01 bioinformatics 10.1101/2023.05.29.542778 medRxiv
Top 0.1%
40.1%

Online source separation of EEG signals plays a crucial role in understanding and interpreting brain dynamics in real-time applications such as brain-computer interfaces (BCIs). In this paper, we propose OICNet, a novel neural network designed specifically for online EEG source separation using independent component analysis, aiming to address the challenges of real-time computational efficiency and reliable extraction of independent components from EEG data streams. The OICNet is trained on a loss function that integrates non-Gaussianity measurement and an orthogonality constraint to achieve effective decomposition of multi-channel EEG signals. We conducted a comprehensive evaluation of OICNet on both task-related and task-free EEG datasets with comparison against conventional and network-based ICA counterparts. The results demonstrate that OICNet outperforms existing methods in terms of accuracy and computational efficiency. Overall, OICNet provides high-efficiency real-time EEG source separation capabilities and paves the way for advancements in deep-learning EEG processing in real-world BCI applications.
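
A loss that combines a non-Gaussianity term with an orthogonality constraint, as this abstract describes, might look like the sketch below. The abstract does not say which non-Gaussianity measure or penalty form OICNet uses, so the choices here are assumptions: absolute excess kurtosis as the non-Gaussianity proxy, a squared Frobenius penalty on W Wᵀ − I, and the weighting parameter `lam`. All function names are hypothetical.

```python
def excess_kurtosis(signal):
    """Excess kurtosis (0 for a Gaussian); assumes the signal has non-zero variance."""
    n = len(signal)
    mu = sum(signal) / n
    var = sum((x - mu) ** 2 for x in signal) / n
    m4 = sum((x - mu) ** 4 for x in signal) / n
    return m4 / var ** 2 - 3.0


def orthogonality_penalty(W):
    """Squared Frobenius norm of W W^T - I for a matrix given as a list of rows."""
    total = 0.0
    for i, row_i in enumerate(W):
        for j, row_j in enumerate(W):
            dot = sum(a * b for a, b in zip(row_i, row_j))
            total += (dot - (1.0 if i == j else 0.0)) ** 2
    return total


def unmix(W, X):
    """Apply unmixing matrix W (k x c) to channels X (c x T), giving k sources."""
    n_samples = len(X[0])
    return [
        [sum(w * ch[t] for w, ch in zip(row, X)) for t in range(n_samples)]
        for row in W
    ]


def ica_style_loss(W, X, lam=1.0):
    """Lower is better: reward non-Gaussian sources, penalize non-orthogonal W."""
    sources = unmix(W, X)
    non_gaussianity = sum(abs(excess_kurtosis(s)) for s in sources)
    return -non_gaussianity + lam * orthogonality_penalty(W)


# Toy two-channel recording (hypothetical values).
X = [[-1.0, 1.0, -1.0, 1.0], [-2.0, 0.0, 0.0, 2.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
print(ica_style_loss(identity, X))
```

In a trained network the matrix W would be produced or updated by gradient descent on this kind of objective; the sketch only evaluates the loss for a fixed W.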

8
Data-Driven Early Prediction of Cerebral Palsy Using AutoML and interpretable kinematic features

Segado, M.; Prosser, L.; Duncan, A. F.; Johnson, M. J.; Kording, K. P.

2025-02-12 pediatrics 10.1101/2025.02.10.25322007 medRxiv
Top 0.1%
36.9%

Early identification of cerebral palsy (CP) remains a major challenge due to the reliance on expert assessments that are time-intensive and not scalable. Consequently, a range of studies have aimed at using machine learning to predict CP scores based on motion tracking, e.g., from video data. These studies generally predict clinical scores which are a proxy for CP risk. However, clinicians do not really want to estimate scores; they want to estimate the patient's risk of developing clinical symptoms. Here we present a data-driven machine-learning (ML) pipeline that extracts movement features from infant video-based motion tracking and estimates CP risk using AutoML. Using AutoSklearn, our framework minimizes risk of overfitting by abstracting away researcher-driven hyperparameter optimization. Trained on movement data from 3- to 4-month-old infants, our classifier predicts a highly indicative clinical score (General Movements Assessment [GMA]) with an ROC-AUC of 0.78 on a held-out test set, indicating that kinematic movement features capture clinically relevant variability. Without retraining, the same model predicts the risk of cerebral palsy outcomes at later clinical follow-ups with an ROC-AUC of 0.74, demonstrating that early motor representations generalize to long-term neurodevelopmental risk. We employ pre-registered lock-box validation to ensure rigorous performance evaluation. This study highlights the potential of AutoML-powered movement analytics for neurodevelopmental screening, demonstrating that data-driven feature extraction from movement trajectories can provide an interpretable and scalable approach to early risk assessment. 
By integrating pre-trained vision transformers, AutoML-driven model selection, and rigorous validation protocols, this work advances the use of video-derived movement features for scalable, data-driven clinical assessment, demonstrating how computational methods based on readily available data like infant videos can enhance early risk detection in neurodevelopmental disorders. CCS Concepts: Computing methodologies → Machine learning approaches; Applied computing → Health informatics. ACM Reference Format: Melanie Segado, Laura Prosser, Andrea F. Duncan, Michelle J. Johnson, and Konrad P. Kording. Data-Driven Early Prediction of Cerebral Palsy Using AutoML and interpretable kinematic features. In. ACM, New York, NY, USA, 8 pages.

9
Predicting Mental and Psychomotor Delay in Very Pre-term Infants using Large Language Models

Huang, Z.; Flory, M. J.; Kittler, P. M.; Phan, H. T.; Demirci, G. M.; Gordon, A. D.; Parab, S. M.; Tsai, C.-L.

2025-08-02 pediatrics 10.1101/2025.07.31.25332524 medRxiv
Top 0.1%
34.8%

Very preterm infants face a considerably higher risk of neurodevelopmental delays, making early diagnosis and timely intervention crucial for improving long-term outcomes. In this study, we utilized large language models (LLMs) to predict mental and psychomotor delays at 25 months using maternal and perinatal records combined with longitudinal features up to 22 months of age. The LLMs were employed to generate natural language descriptions for each infant, which were then used as input for a language model-based classifier to perform predictions. Our model achieved a 4.2% increase in AUCROC in mental delay prediction and 3.2% increase in psychomotor delay prediction 3 months before the 25-month assessment, compared to a random forest-based model for numerical tabular data only. These findings highlight the potential of LLMs as powerful tools for assessing the risk of neurodevelopmental delays in preterm infants.

10
CoRhythMo: A Computational Framework for Modeling Biobehavioral Rhythms from Mobile and Wearable Data Streams

Yan, R.; Liu, X.; Dutcher, J.; Tumminia, M.; Villalba, D.; Cohen, S.; Creswell, D.; Creswell, K.; Mankoff, J.; Dey, A.; Doryab, A.

2020-08-10 bioinformatics 10.1101/2020.08.10.244020 medRxiv
Top 0.1%
33.7%

This paper presents CoRhythMo, the first computational framework for modeling biobehavioral rhythms - the repeating cycles of physiological, psychological, social, and environmental events - from mobile and wearable data streams. The framework incorporates four main components: mobile data processing, rhythm discovery, rhythm modeling, and machine learning. We use a dataset of smartphone and Fitbit data collected from 138 college students over a semester to evaluate the framework's ability to 1) model biobehavioral rhythms of students, 2) measure the stability of their rhythms over the course of the semester, 3) model differences between rhythms of students with different health status, and 4) predict the mental health status in students using the model of their biobehavioral rhythms. Our evaluation provides evidence for the feasibility of using CoRhythMo for modeling and discovering human rhythms and using them to assess and predict different life and health outcomes.

11
Empirical Review of LLM-driven Classification of Multidimensional Sleep Health Mentions from Free-Text Clinical Notes

Hussain, S.-A.; Calloway, A.; Sirrianni, J.; Fosler-Lussier, E.; Davenport, M.

2025-06-05 pediatrics 10.1101/2025.06.04.25328983 medRxiv
Top 0.1%
33.1%

Accurate multidimensional sleep health (MSH) information is often fragmented and inconsistently represented within hospital infrastructures, leaving crucial details buried in unstructured clinical notes rather than discrete fields. This inconsistency complicates large-scale phenotyping, secondary analyses, and clinical decision support regarding sleep-related outcomes. In this work, we systematically explore contemporary natural language processing techniques, prompt-based large language models (LLMs) and fine-tuned discriminative classifiers, to bridge this critical gap. We evaluate performance on extracting nine key MSH dimensions (timing, duration, efficiency, sleep disorders, daytime sleepiness, interventions, medication, behavior, and satisfaction) from clinical narratives using public datasets (MIMIC-III derivatives) and an internally annotated pediatric sleep corpus. Initially, we assess generative LLM performance using dynamic few-shot prompting, analyzing impacts from varying prompt structures, example quantity, and domain-specificity without explicit task-specific fine-tuning. Subsequently, we fine-tune generative LLM architectures on both in-task and out-of-task data to quantify performance improvements and limitations. Lastly, we benchmark these generative approaches against encoder-based discriminative classifiers (ModernBERT), designed to directly estimate binary presence of each MSH class within full clinical notes. Our experiments demonstrate that fine-tuned discriminative models consistently provide higher classification accuracy, lower inference latency, and more robust span-level identification than either prompted or fine-tuned generative LLMs, given adequate training data. Nonetheless, generative LLMs retain moderate utility in low-data scenarios. 
Importantly, our results highlight persistent challenges, including difficulty extracting subtle sleep constructs such as sleep efficiency and daytime sleepiness, and biases associated with patient demographics and clinical departments. We conclude by suggesting future research directions: refining span extraction methods, mitigating biases in model performance, and exploring advanced chain-of-thought prompting techniques to achieve reliable, scalable MSH phenotyping within real-world clinical systems.

12
SynLS: A novel diffusion-transformer framework for generating high-quality wearable sensor time series data to enhance health monitoring

Lin, D.; Ji, Y.; McArt, J.; Li, J.

2025-05-15 bioinformatics 10.1101/2025.05.11.653212 medRxiv
Top 0.1%
32.8%

While global medical research is poised to benefit from the rapid advance of artificial intelligence (AI) technologies, veterinary medicine research often faces significant limitations due to data scarcity and availability issues. To address this issue, we proposed a generative modeling framework, SynLS, for generating highly realistic synthetic wearable sensor data. Leveraging a diffusion architecture and a transformer encoder mechanism, SynLS addressed the intricate challenges posed by these real-world wearable sensor data, including varied length, multiple dimensions, high diversity, high noise, periodicity, and trend. We have validated SynLS on four publicly available livestock wearables databases with records for three health events (calving, estrus and diseases), and demonstrated its ability in producing high-fidelity wearable sensor data, which could improve the downstream health events prediction tasks by 18.5% and 26.8% under two evaluation scenarios based on instance and timestamp, respectively. Additionally, introducing raw tri-axial accelerometer databases collected from animals and humans further demonstrated the extensibility of our framework, significantly enhancing downstream behavior classification tasks by 38.8% and 83.8%, respectively. The technical framework proposed in this work offers a potential generalized solution for data supplementation in wearable sensor databases, with potential applicability across veterinary medicine and other medical domains facing resource constraints.

13
Exploring brain lobe-specific insights in an explainable framework for EEG-based schizophrenia detection

Hossain, M. M.; Tawhid, M. N. A.

2025-10-01 bioinformatics 10.1101/2025.09.29.679358 medRxiv
Top 0.1%
32.5%

Schizophrenia (ScZ) is a growing global health concern that affects millions of people and puts severe pressure on healthcare systems. Early detection and accurate diagnosis are crucial for adequate management. Electroencephalography (EEG) has evolved into a promising non-invasive tool for detecting ScZ in contemporary research. However, specific biomarkers, especially those related to brain lobes, often cannot be identified by current EEG-based diagnostic methods. Different brain lobes are associated with distinct cognitive functions and patterns of diseases. Also, there is a gap in the incorporation of XAI techniques, as medical diagnosis needs trustworthiness and explainability. This study strives to address these gaps by developing a framework using mel-spectrogram images with Convolutional Neural Networks (CNNs). EEG signals are converted into mel-spectrogram images using Short-Time Fourier Transform (STFT). After that, these images are analyzed using a CNN model to perform classification between ScZ and healthy control (HC). To identify the most critical brain regions, the full brain is divided into five different regions, and the same classification process is performed. The performance of the proposed framework is evaluated using two publicly available EEG datasets, repOD and the Kaggle basic sensory task dataset, on which it achieves accuracies of 99.82% and 98.31%, respectively. Among regions, the frontal lobe has the most significant performance with an accuracy of 97.02% and 88.03%, respectively, in these datasets, followed by the temporal lobe. Conversely, the occipital lobe shows the lowest accuracy among lobes, with only 79.30% and 68.33% accuracy, respectively, showing its lower significance in the diagnosis. To bring result explainability, LIME, SHAP, and the Grad-CAM methods are applied, providing valuable insights for clinicians and researchers. 
These findings emphasize the potential of EEG-based brain lobe analysis in enhancing ScZ detection, diagnostic accuracy, explainability, and clinical guidance.

14
Detecting and Monitoring Brain Disorders Using Smartphones and Machine Learning

Colbaugh, R.; Glass, K.

2020-10-06 health informatics 10.1101/2020.10.03.20206235 medRxiv
Top 0.1%
28.3%

The ubiquity of smartphones in modern life suggests the possibility to use them to continuously monitor patients, for instance to detect undiagnosed diseases or track treatment progress. Such data collection and analysis may be especially beneficial to patients with (i) mental disorders, as these individuals can experience intermittent symptoms and impaired decision-making, which may impede diagnosis and care-seeking, and (ii) progressive neurological diseases, as real-time monitoring could facilitate earlier diagnosis and more effective treatment. This paper presents a new method of leveraging passively collected smartphone data and machine learning to detect and monitor brain disorders such as depression and Parkinson's disease. Crucially, the algorithm is able to learn accurate, interpretable models from small numbers of labeled examples (i.e., smartphone users for whom sensor data has been gathered and disease status has been determined). Predictive modeling is achieved by learning from both real patient data and synthetic patients constructed via adversarial learning. The proposed approach is shown to outperform state-of-the-art techniques in experiments involving disparate brain disorders and multiple patient datasets.

15
Harnessing consumer wearable digital biomarkers for individualized recognition of postpartum depression using the All of Us Research Program dataset

Hurwitz, E.; Butzin-Dozier, Z.; Master, H.; O'Neil, S. T.; Walden, A.; Holko, M.; Patel, R. C.; Haendel, M. A.

2023-10-14 health informatics 10.1101/2023.10.13.23296965 medRxiv
Top 0.1%
27.7%

Postpartum depression (PPD), afflicting one in seven women, poses a major challenge in maternal health. Existing approaches to detect PPD heavily depend on in-person postpartum visits, leading to cases of the condition being overlooked and untreated. We explored the potential of consumer wearable-derived digital biomarkers for PPD recognition to address this gap. Our study demonstrated that intra-individual machine learning (ML) models developed using these digital biomarkers can discern between pre-pregnancy, pregnancy, postpartum without depression, and postpartum with depression time periods (i.e., PPD diagnosis). When evaluating variable importance, calories burned from the basal metabolic rate (calories BMR) emerged as the digital biomarker most predictive of PPD. To confirm the specificity of our method, we demonstrated that models developed in women without PPD could not accurately classify the PPD-equivalent phase. Prior depression history did not alter model efficacy for PPD recognition. Furthermore, the individualized models demonstrated superior performance compared to a conventional cohort-based model for the detection of PPD, underscoring the effectiveness of our individualized ML approach. This work establishes consumer wearables as a promising avenue for PPD identification. More importantly, it also emphasizes the utility of individualized ML model methodology, potentially transforming early disease detection strategies.

16
Semantic-Aware Energy-Efficient Operation in Smart Capsule Endoscopy

Zoofaghari, M.; Rahaimifard, A.; Chatterjee, S.; Balasingham, I.

2026-03-19 bioinformatics 10.64898/2026.03.17.712375 medRxiv
Top 0.1%
27.4%

Goal-oriented semantic communication has recently emerged in wireless sensor-actuator networks, emphasizing the meaning and relevance of information over raw data delivery, thereby enabling resource-efficient telecommunication. This paradigm offers significant benefits for intra-body or implantable sensor-actuator networks, including dramatic reductions in bandwidth requirements, latency, and power consumption. In this paper, we address a patch-based energy-efficient anomaly detection method for smart capsule endoscopy. We propose a deep learning-based algorithm that employs the similarity between features extracted from measured images and a reference (normal) image as the detection metric. The algorithm is evaluated using a clinical dataset of capsule-captured images, combined with a simulated intra-body channel model. The results demonstrate that even with only 60% of the transmission power (relative to a standard link design for QPSK modulation) and 65% of the light intensity, the probability of anomaly detection remains above 85%, and it gradually improves as power and illumination levels increase. This improvement translates into a potential battery life extension of over 43%. The findings highlight the potential of semantic-aware, energy-efficient intra-body devices for more sustainable and effective medical interventions.

17
Improving wearable-based seizure prediction by feature fusion using growing network

Hasija, T.; Kuschel, M.; Jackson, M.; Dailey, S.; Menne, H.; Reinsberger, C.; Vieluf, S.; Loddenkemper, T.

2025-01-29 bioinformatics 10.1101/2025.01.28.635212 medRxiv
Top 0.1%
26.5%

The unpredictability of seizures is one of the most compromising features reported by people with epilepsy. Non-stigmatizing and easy-to-use wearable devices may provide information to predict seizures based on physiological data. We propose a patient-agnostic seizure prediction method that identifies group-level patterns across data from multiple patients. We employ supervised long short-term memory networks (LSTMs) and add unsupervised deep canonically correlated autoencoders (DCCAE) and 24-hour patterns using time-of-day information. We fuse features from these three techniques using a growing neural network, allowing incremental learning. Our method with all three features improves prediction accuracy over the baseline LSTM by 7.3 percentage points, from 74.4% to 81.7%, averaged across all patients, and outperforms the LSTM in 84% of patients. Compared to the all-at-once fusion, the growing network improves the accuracy by 9.5%. We analyze the impact of preictal data duration, wearable data quality, and clinical variables on the prediction performance.

18
Towards Automated Neonatal EEG Analysis: Multi-Center Validation of a Reliable Deep Learning Pipeline

Hermans, T.; Dereymaeker, A.; Lemmens, K.; Jansen, K.; Usman, F.; Robinson, S.; Naulaers, G.; De Vos, M.; Hartley, C.

2025-10-17 pediatrics 10.1101/2025.10.16.25338113 medRxiv
Top 0.1%
26.2%

Objective: To evaluate the reliability and generalization of NeoNaid, a fully automated software tool for neonatal EEG analysis, based on functional brain age (FBA) estimation and sleep staging. Methods: NeoNaid combines a multi-task deep learning model with proposed quality control routines detecting artefacts, out-of-distribution inputs, and uncertain predictions. Based on a raw EEG input, it outputs one global FBA estimate and a continuous 2-state hypnogram. We validated performance in two independent hospital settings: an internal dataset (33 EEGs, 17 infants, median 900 minutes/recording) and an external dataset (38 EEGs, 24 infants, median 124 minutes/recording). Results: Quality control rejected a comparable number of segments in the internal and external datasets, reducing extreme errors in FBA estimation and modestly improving sleep staging accuracy. Across the internal and external data, NeoNaid achieved median absolute FBA errors of 0.50 and 0.55 weeks and Cohen's kappa values of 0.89 and 0.87 for quiet sleep detection, respectively. Conclusions: NeoNaid demonstrated improved reliability through integrated quality control and robust generalization across recording setups. Significance: By focusing on validation and trustworthiness, this work takes an essential step toward clinical adoption of automated neonatal EEG analysis and supports its utility for both NICU practice and large-scale research.

19
selfRL: Two-Level Self-Supervised Transformer Representation Learning for Link Prediction of Heterogeneous Biomedical Networks

Wang, X.; Yang, Y.; Liao, X.; Li, K.; Li, F.; Peng, S.

2020-10-21 bioinformatics 10.1101/2020.10.20.347153 medRxiv
Top 0.1%
26.0%

Predicting potential links in heterogeneous biomedical networks (HBNs) can greatly benefit various important biomedical problems. However, self-supervised representation learning for link prediction in HBNs has been only slightly explored in previous research. Therefore, this study proposes a two-level self-supervised representation learning method, namely selfRL, for link prediction in heterogeneous biomedical networks. The meta path detection-based self-supervised learning task is proposed to learn representation vectors that can capture the global-level structure and semantic features in HBNs. The vertex entity mask-based self-supervised learning mechanism is designed to enhance local association of vertices. Finally, the representations from the two tasks are concatenated to generate high-quality representation vectors. The results of link prediction on six datasets show that selfRL outperforms 25 state-of-the-art methods. In particular, selfRL achieves results close to 1 in terms of AUC and AUPR on the NeoDTI-net dataset. In addition, PubMed publications demonstrate that nine out of ten drugs screened by selfRL can inhibit the cytokine storm in COVID-19 patients. In summary, selfRL provides a general framework that develops self-supervised learning tasks with unlabeled data to obtain promising representations for improving link prediction.

20
Integrating Genomics into Multimodal EHR Foundation Models

Amar, J.; Liu, E.; Breschi, A.; Zhang, L.; Kheradpour, P.; Li, S.; Soleymani Lehmann, L.; Giulianelli, A.; Edwards, M.; Nola, D.; Mani, R.; Vats, P.; Tetreault, J.; Chen, T. J.; McLean, C. Y.

2025-10-27 bioinformatics 10.1101/2025.10.26.684668 medRxiv
Top 0.1%
25.9%

This paper introduces an innovative Electronic Health Record (EHR) foundation model that integrates Polygenic Risk Scores (PRS) as a foundational data modality, moving beyond traditional EHR-only approaches to build more holistic health profiles. Leveraging the extensive and diverse data from the All of Us (AoU) Research Program, this multimodal framework aims to learn complex relationships between clinical data and genetic predispositions. The methodology extends advancements in generative AI to the EHR foundation model space, enhancing predictive capabilities and interpretability. Evaluation on AoU data demonstrates the model's predictive value for the onset of various conditions, particularly Type 2 Diabetes (T2D), and illustrates the interplay between PRS and EHR data. The work also explores transfer learning for custom classification tasks, showcasing the architecture's versatility and efficiency. This approach is pivotal for unlocking new insights into disease prediction, proactive health management, risk stratification, and personalized treatment strategies, laying the groundwork for more personalized, equitable, and actionable real-world evidence generation in healthcare.